Probe library construction

ABSTRACT

The present invention generally relates to systems and methods for producing nucleic acids. In some aspects, relatively large quantities of oligonucleotides can be produced, and in some cases, the oligonucleotides may have a variety of different sequences and/or lengths. For instance, a relatively small quantity of oligonucleotides may be amplified to produce a large amount of oligonucleotides. In one set of embodiments, oligonucleotides may be amplified using PCR, then transcribed to produce RNA. The RNA may then be reverse transcribed to produce DNA, and optionally, the RNA may be selectively degraded or removed, relative to the DNA. In one set of embodiments, the oligonucleotides may be chemically modified. These modifications may include, but are not limited to, the addition of fluorescent dyes or other signaling entities.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/329,651, filed Jan. 27, 2017, entitled “Probe Library Construction,” by Zhuang, et al., which is a national stage filing of International Patent Application Serial No. PCT/US2015/042559, filed Jul. 29, 2015, entitled “Probe Library Construction,” by Zhuang, et al., which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/050,636, filed Sep. 15, 2014, entitled “Probe Library Construction,” by Zhuang, et al.; U.S. Provisional Patent Application Ser. No. 62/031,062, filed Jul. 30, 2014, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al.; and U.S. Provisional Patent Application Ser. No. 62/142,653, filed Apr. 3, 2015, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al. Each of the above is incorporated herein by reference.

GOVERNMENT FUNDING

This invention was made with government support under Grant No. GM096450 awarded by the National Institutes of Health. The government has certain rights in the invention.

FIELD

The present invention generally relates to systems and methods for producing nucleic acids.

BACKGROUND

Custom-synthesized oligonucleotide probes have emerged as a powerful tool for the identification and isolation of specific nucleic acid targets via hybridization. Applications for such hybridization probe sets range from next generation sequencing, where such probes are used to enrich or deplete samples for specific nucleic acid targets, to imaging of fixed samples, where fluorescently labeled hybridization probes allow the direct measurement of the number and spatial organization of the targeted species.

There is now a wide range of commercial sources for such probes. Such probes are often made by synthesizing each oligonucleotide member using standard solid phase synthesis methods. Unfortunately, this limits both the number of probes within a single set and the number of unique sets, because each oligonucleotide member must be individually and separately synthesized.

Recent advances in array-based synthesis of oligonucleotides by several companies have reduced the cost of producing oligonucleotides. However, these approaches also produce 1000-fold fewer oligonucleotide probes than are required for a single hybridization reaction, thus limiting their usefulness. Accordingly, improvements in oligonucleotide production are needed.

SUMMARY

The present invention generally relates to systems and methods for producing nucleic acids. The subject matter of the present invention involves, in some cases, interrelated products, alternative solutions to a particular problem, and/or a plurality of different uses of one or more systems and/or articles.

In one aspect, the present invention is generally directed to a method. The method, in accordance with one set of embodiments, includes amplifying at least some of a plurality of oligonucleotides using real-time PCR to produce amplified oligonucleotides, transcribing in vitro at least some of the amplified oligonucleotides to produce RNA, reverse transcribing the RNA to produce transcribed DNA, and selectively degrading the RNA relative to the transcribed DNA.

In another set of embodiments, the method includes simultaneously amplifying at least some of a plurality of oligonucleotides in a common solution using PCR to produce amplified oligonucleotides, transcribing in vitro at least some of the amplified oligonucleotides to produce RNA, reverse transcribing the RNA to produce transcribed DNA, and selectively degrading the RNA relative to the transcribed DNA.

In yet another set of embodiments, the method includes acts of providing a plurality of oligonucleotides having an average length of between 10 and 200 nucleotides and including at least 100 unique oligonucleotide sequences, producing amplified oligonucleotides comprising one of the plurality of oligonucleotides and a promoter using real-time PCR, transcribing at least some of the amplified oligonucleotides to produce RNA using an RNA polymerase, reverse transcribing the RNA to produce DNA using a primer comprising a signaling entity, and chemically reducing the RNA.

The method, in still another set of embodiments, includes acts of providing a plurality of oligonucleotides having an average length of between 10 and 200 nucleotides and including at least 100 unique oligonucleotide sequences, producing amplified oligonucleotides in a common solution comprising one of the plurality of oligonucleotides and a promoter using PCR, transcribing at least some of the amplified oligonucleotides to produce RNA using an RNA polymerase, reverse transcribing the RNA to produce DNA using a primer comprising a signaling entity, and chemically reducing the RNA.

In another aspect, the present invention encompasses methods of making one or more of the embodiments described herein, such as oligonucleotides, including but not limited to modified oligonucleotides such as those described herein (e.g., labeled with a signaling entity). In still another aspect, the present invention encompasses methods of using one or more of the embodiments described herein, such as oligonucleotides, including but not limited to modified oligonucleotides such as those described herein (e.g., labeled with a signaling entity).

Other advantages and novel features of the present invention will become apparent from the following detailed description of various non-limiting embodiments of the invention when considered in conjunction with the accompanying figures. In cases where the present specification and a document incorporated by reference include conflicting and/or inconsistent disclosure, the present specification shall control. If two or more documents incorporated by reference include conflicting and/or inconsistent disclosure with respect to each other, then the document having the later effective date shall control.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting embodiments of the present invention will be described by way of example with reference to the accompanying figures, which are schematic and are not intended to be drawn to scale. In the figures, each identical or nearly identical component illustrated is typically represented by a single numeral. For purposes of clarity, not every component is labeled in every figure, nor is every component of each embodiment of the invention shown where illustration is not necessary to allow those of ordinary skill in the art to understand the invention. In the figures:

FIG. 1 illustrates the production of DNA probes, in accordance with one set of embodiments;

FIG. 2 illustrates a template molecule (SEQ ID NO: 3) produced in accordance with one embodiment of the invention;

FIGS. 3A-3D list optimized primers from the E. coli transcriptome, in another embodiment of the invention (the sequences, from top to bottom and left to right, by page, correspond to SEQ ID NOs: 4-57, 58-111, 112-165, and 166-201); and

FIGS. 4A-4BV show various probes in accordance with yet another embodiment of the invention. The sequences in FIGS. 4A-4BV, from top to bottom and left to right, by page, correspond to SEQ ID NOs: 202-221, 222-243, 244-265, 266-287, 288-309, 310-331, 332-353, 354-375, 376-397, 398-419, 420-441, 442-463, 464-485, 486-507, 508-529, 530-551, 552-573, 574-595, 596-617, 618-639, 640-661, 662-683, 684-705, 706-727, 728-749, 750-771, 772-793, 794-815, 816-837, 838-859, 860-881, 882-903, 904-925, and 926-937.

DETAILED DESCRIPTION

The present invention generally relates to systems and methods for producing nucleic acids. In some aspects, relatively large quantities of oligonucleotides can be produced, and in some cases, the oligonucleotides may have a variety of different sequences and/or lengths. For instance, a relatively small quantity of oligonucleotides may be amplified to produce a large amount of oligonucleotides. In one set of embodiments, oligonucleotides may be amplified using PCR, then transcribed to produce RNA. The RNA may then be reverse transcribed to produce DNA, and optionally, the RNA may be selectively degraded or removed, relative to the DNA. In one set of embodiments, the oligonucleotides may be chemically modified. These modifications may include, but are not limited to, the addition of fluorescent dyes or other signaling entities.

U.S. Provisional Patent Application Ser. No. 62/031,062, filed Jul. 30, 2014, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al. is incorporated herein by reference in its entirety.

In one aspect, the present invention is generally directed to in vitro methods of amplifying a plurality of oligonucleotides. In some cases, relatively large numbers of unique oligonucleotides within a plurality of oligonucleotides may be amplified. For instance, a plurality of oligonucleotides to be amplified may include 10, 100, 1,000, or more unique sequences.

In addition, in some embodiments, the oligonucleotides may be amplified without selective amplification of some oligonucleotides over others, e.g., due to competitive effects.

Although some drift may occur, it is desired that the relative ratios of the oligonucleotides within a plurality of oligonucleotides stay substantially the same after amplification, at least for some applications. However, in many amplification techniques, due to differences in binding or affinity of different oligonucleotides, some oligonucleotides may be amplified to a greater degree than others, and thus, specific techniques need to be utilized to reduce or eliminate this problem, for example, by separately amplifying each of the oligonucleotides before combining them together to form the plurality of oligonucleotides. In contrast, as is discussed herein, in certain embodiments, a plurality of oligonucleotides can be amplified without causing substantial alterations or changes in the ratios of the oligonucleotides, without requiring separation of the oligonucleotides, separate growth of the oligonucleotides, or other cumbersome techniques.

Referring now to FIG. 1, one example of an embodiment of the invention is illustrated. In this figure, a plurality of oligonucleotides 10 is provided. This may include 1, 10, 100, 1,000, 10,000, 100,000, or any other suitable number of unique oligonucleotide sequences. Of course, more than one copy of any particular unique oligonucleotide may also be present within the plurality of oligonucleotides. The unique oligonucleotides may have the same or different lengths. In some cases, the plurality of oligonucleotides has an overall average length (number average or arithmetic mean) of less than 200 nt (nucleotides), although longer average lengths are also possible in some embodiments.

The plurality of oligonucleotides 10 may initially be amplified, using PCR (polymerase chain reaction) or another suitable oligonucleotide amplification method, to produce a plurality of amplified oligonucleotides 20. In some cases, PCR may be used to generate thousands to millions of copies per oligonucleotide within the plurality of oligonucleotides. In some embodiments, the plurality of oligonucleotides may be amplified while still contained in a common solution, for instance, without requiring separation of the oligonucleotides prior to amplification, e.g., as is required in certain techniques such as emulsion PCR.

Within a common solution, while it is possible that different oligonucleotides of the plurality of oligonucleotides may be amplified at different rates (e.g., leading to non-uniform amplification of the plurality of oligonucleotides, and the potential loss of complexity or species within the plurality of oligonucleotides during amplification), in certain embodiments, this can be reduced or minimized through the use of various oligonucleotide structures and/or through the use of certain types of PCR techniques, as is discussed herein.

As an example, in one set of embodiments, the plurality of oligonucleotides may all be chosen to minimize competitive effects, e.g., as caused by differences in binding or affinity of the oligonucleotides to reagents within the common solution or the preferential enzymatic amplification of some sequence features. For example, in one set of embodiments, the plurality of oligonucleotides may be chosen to have similar lengths and/or sequences.

As is shown in FIG. 1, the oligonucleotides of the plurality of oligonucleotides may each contain one or more index regions 15, 16 on one or both ends of the oligonucleotides that can be recognized by certain reagents. In some cases, the oligonucleotides of the plurality of oligonucleotides may have one or more index regions with which suitable primers can interact in order to allow PCR or other amplification to occur. In some embodiments, these index regions can be used to selectively produce DNA probes from only a subset of the plurality of oligonucleotides 10.

These index regions can also be used in some instances to add additional sequences to those of the plurality of oligonucleotides; e.g., as is shown in FIG. 1, oligonucleotide 11 may include an index region 15, to which a sequence 17 containing a T7 promoter can bind, so that the promoter is introduced into the amplified oligonucleotides (e.g., as region 23). Thus, various sequences that include a portion able to bind an index region may be applied to the plurality of oligonucleotides. In some cases, if substantially all of the plurality of oligonucleotides contain similar or identical index regions, the relative affinities or binding of enzymes such as polymerases to the index regions of the oligonucleotides may be substantially similar or identical, which may allow relatively uniform amplification to occur. The plurality of oligonucleotides may also contain other regions that differ between oligonucleotides to produce a plurality of unique oligonucleotides, e.g., region 12. These regions can vary in terms of length and/or sequence, etc.

In some embodiments, the amount of amplification that occurs may be carefully controlled by monitoring the PCR amplification reaction, e.g., using techniques such as real-time PCR. This may occur, for example, using oligonucleotides having common index regions, substantially similar lengths and/or sequences, etc., including those previously discussed, or with other suitable pluralities of oligonucleotides. For instance, in some embodiments, the PCR reaction may be monitored by illuminating the solution containing the oligonucleotides with suitable light and determining the amount of fluorescence that is present, which can be related to the DNA present within the sample. Techniques for monitoring PCR reactions, such as real-time PCR methodologies, are known to those of ordinary skill in the art. The PCR reaction can also be controlled, in some embodiments, by controlling the amount and/or concentration of nucleotides and/or cofactors that are present.

After amplification, the plurality of amplified oligonucleotides 20 may be transcribed to produce a plurality of RNAs 30, as is shown in the example of FIG. 1. This may be performed, for example, by exposing the amplified oligonucleotides to a suitable RNA polymerase, such as T7 RNA polymerase, that can transcribe the oligonucleotides to produce corresponding RNA. The amount of RNA production can be controlled in some embodiments by controlling the amount and/or concentration of nucleotides and/or cofactors that are present as well as the duration of the in vitro transcription reaction.

The plurality of RNAs 30 may then be used to produce additional amounts of DNA 40, e.g., by reverse transcription. For example, a suitable enzyme, such as a reverse transcriptase, may be used to perform the reverse transcription. In some cases, primers may be used to facilitate reverse transcription, and in some embodiments, the primers may also be used to attach additional entities to the DNA. For example, signaling entities may be attached, as is shown with signaling entity 48 in FIG. 1. Alternatively, additional nucleic acid sequences can also be attached, which can serve to recruit additional oligonucleotides via Watson-Crick base-pairing. The amount of DNA that is produced can be controlled, for example, by controlling the amount and/or concentration of nucleotides and/or cofactors that are present, as well as the duration and temperature of the reverse transcription reaction.

In some embodiments, multiple copies of DNA may be produced from each RNA molecule. In addition, optionally, the RNA may then be removed or selectively degraded, relative to the DNA, for example, through alkaline hydrolysis, enzymatic digestion, or other techniques.
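As a non-limiting illustration, the sequence bookkeeping of the FIG. 1 workflow can be sketched as string operations in Python. The sketch assumes a hypothetical library design in which each oligonucleotide reads 5′ index region, variable region, 3′ index region; the T7 promoter is added through the reverse PCR primer; and the 5′ index region also serves as the reverse transcription primer. The function names, the index sequences, and the use of the commonly used T7 promoter sequence TAATACGACTCACTATAGGG (with transcription assumed to begin at the first G after TATA) are illustrative assumptions, not the specific design of FIG. 1. Selective degradation of the RNA is modeled simply by no longer using it.

```python
"""Toy string-level model: PCR adds a T7 promoter, in vitro transcription (IVT)
makes RNA, and reverse transcription (RT) makes single-stranded DNA probes."""

_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

# Assumed T7 promoter; transcription assumed to start at index 17 (the +1 G).
T7_PROMOTER = "TAATACGACTCACTATAGGG"
T7_PLUS1 = 17


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA or RNA string, returned in the DNA alphabet."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pcr_add_t7(oligo: str, index3: str) -> tuple[str, str]:
    """PCR with a reverse primer = T7_PROMOTER + revcomp(index3) (assumed design).

    Returns the (top, bottom) strands of the double-stranded product, each 5'->3'.
    """
    if not oligo.endswith(index3):
        raise ValueError("reverse primer cannot anneal: oligo lacks the 3' index")
    top = oligo + revcomp(T7_PROMOTER)
    return top, revcomp(top)  # bottom strand = T7_PROMOTER + revcomp(oligo)


def transcribe(sense_strand: str) -> str:
    """IVT: copy the sequence downstream of the promoter, with U in place of T."""
    if not sense_strand.startswith(T7_PROMOTER):
        raise ValueError("no T7 promoter at the 5' end")
    return sense_strand[T7_PLUS1:].replace("T", "U")


def reverse_transcribe(rna: str, anneal: str, tail: str = "") -> str:
    """RT primed by tail + anneal; `anneal` must pair with the RNA 3' end.

    The whole primer, including any 5' tail, becomes the 5' end of the new DNA.
    """
    if not rna.replace("U", "T").endswith(revcomp(anneal)):
        raise ValueError("RT primer does not match the RNA 3' end")
    return tail + revcomp(rna)


index5 = "GTTGGTCGGCACGATGCATG"      # hypothetical 5' index region
variable = "ACGTTGCAAGCTTAGCCTAGGATCC"  # hypothetical variable region
index3 = "CGCAACGCTTGGGACGGTTC"      # hypothetical 3' index region
oligo = index5 + variable + index3

_, bottom = pcr_add_t7(oligo, index3)
rna = transcribe(bottom)
probe = reverse_transcribe(rna, anneal=index5)  # the RNA is then discarded
print(probe == oligo + "CCC")
```

In this model the probe is the original oligonucleotide followed by CCC, the complement of the GGG that the assumed promoter contributes to the 5′ end of each transcript. Passing a `tail` to `reverse_transcribe` mimics a primer that carries an additional 5′ sequence, which is then incorporated into every DNA copy.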

Accordingly, in certain aspects, the present invention is generally directed to systems and methods of amplifying a plurality of oligonucleotides. In one set of embodiments, relatively large quantities or masses of oligonucleotides can be produced as is discussed herein, e.g., at least about 10⁻³ pmol, at least about 10⁻² pmol, at least about 10⁻¹ pmol, at least about 10⁰ pmol, at least about 10¹ pmol, at least about 10² pmol, at least about 10³ pmol, etc. In addition, in some embodiments, the plurality of oligonucleotides may be substantially diverse. For example, the plurality of oligonucleotides may include at least about 10¹, at least about 10², at least about 10³, at least about 10⁴, at least about 10⁵, or at least about 10⁶ unique sequences of oligonucleotides, even after amplification to the amounts discussed above. (It should also be noted that a plurality or population of oligonucleotides may include more than one copy of a given unique oligonucleotide sequence.) In contrast, certain prior art techniques are able to amplify large numbers of unique oligonucleotides, but to only small quantities or masses (e.g., to amounts of around 10⁻³ pmol or less), or are able to produce large quantities or masses of oligonucleotides, but only for 1 or a few unique sequences (e.g., less than 10 sequences).
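The quantities above can be related to copy numbers through Avogadro's constant. The following sketch assumes, for simplicity, that every unique sequence is equally represented, and converts a total amount in picomoles into the average number of molecules of each unique sequence; for example, 10⁻³ pmol spread over 10⁴ unique sequences corresponds to roughly 6 × 10⁴ copies of each. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def copies_per_sequence(total_pmol: float, n_unique: int) -> float:
    """Average molecules per unique sequence, assuming equal representation."""
    if n_unique < 1:
        raise ValueError("n_unique must be at least 1")
    return total_pmol * 1e-12 * AVOGADRO / n_unique  # 1 pmol = 1e-12 mol


for total_pmol in (1e-3, 1e0, 1e3):
    for n_unique in (100, 10_000, 1_000_000):
        copies = copies_per_sequence(total_pmol, n_unique)
        print(f"{total_pmol:g} pmol, {n_unique:,} sequences: {copies:.1e} copies each")
```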

As discussed, in certain embodiments, a plurality of oligonucleotides, which may include a plurality of unique sequences of oligonucleotides such as those described above, may be amplified without substantial selective amplification of some oligonucleotide sequences over others, e.g., due to competitive effects, unlike in many prior art techniques. Although some drift may occur during the amplification process, the drift may be relatively small. For example, in certain embodiments, the ratios or percentages of a representative unique oligonucleotide sequence, relative to the starting overall population, on the average, may change upon amplification by no more than about 10%, no more than about 8%, no more than about 6%, no more than about 5%, no more than about 4%, no more than about 3%, no more than about 2%, no more than about 1%, etc., relative to the starting ratio or percentage of the oligonucleotide sequence, prior to amplification. However, it should be noted that the oligonucleotide sequence itself, prior to any amplification, may also exhibit some variability, which is not included in the above numbers.
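One hypothetical way to quantify such drift, assuming per-sequence abundances have been measured before and after amplification (for example as sequencing read counts, which is an assumption here rather than a method stated above), is to average the relative change in each sequence's fraction of the population. A sequence that is lost entirely contributes a drift of 100%.

```python
def mean_relative_drift(before: dict[str, float], after: dict[str, float]) -> float:
    """Mean of |f_after - f_before| / f_before over sequences present before amplification,
    where f is a sequence's fraction of its whole population."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    drifts = []
    for seq, count in before.items():
        if count <= 0:
            continue  # relative drift is undefined for sequences absent at the start
        f_before = count / total_before
        f_after = after.get(seq, 0.0) / total_after
        drifts.append(abs(f_after - f_before) / f_before)
    return sum(drifts) / len(drifts)


before = {"seq_a": 100, "seq_b": 100, "seq_c": 200}    # hypothetical counts
after = {"seq_a": 1050, "seq_b": 950, "seq_c": 2000}
print(f"mean drift: {mean_relative_drift(before, after):.1%}")
```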

The unique oligonucleotides within a plurality of oligonucleotides may have the same or different lengths. If more than one unique oligonucleotide is present, then the unique oligonucleotides may independently have the same or different lengths. For example, in some cases, a plurality of oligonucleotides may have an average length (number average) of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 50, at least 60, at least 65, at least 75, at least 100, at least 125, at least 150, at least 175, at least 200, at least 250, at least 300, at least 350, at least 400, or at least 450 nucleotides. In some cases, the average length may be no more than 500, no more than 450, no more than 400, no more than 350, no more than 300, no more than 250, no more than 200, no more than 175, no more than 150, no more than 125, no more than 100, no more than 75, no more than 65, no more than 60, no more than 55, no more than 50, no more than 45, no more than 40, no more than 35, no more than 30, no more than 20, or no more than 10 nucleotides. Combinations of any of these are also possible, e.g., the average length may be between 10 and 30 nucleotides, between 20 and 40 nucleotides, between 5 and 50 nucleotides, between 10 and 200 nucleotides, between 25 and 35 nucleotides, between 10 and 300 nucleotides, etc.

In one set of embodiments, any suitable technique may be used to amplify the plurality of oligonucleotides. In some cases, for each oligonucleotide to be amplified, at least about 100, at least about 300, at least about 500, at least about 1,000, at least about 3,000, at least about 5,000, at least about 10,000, at least about 30,000, at least about 50,000, at least about 100,000, at least about 300,000, at least about 500,000, at least about 1,000,000, at least about 3,000,000, at least about 5,000,000, at least about 10,000,000, at least about 30,000,000, at least about 50,000,000, or at least about 100,000,000 copies of the oligonucleotide may be produced using any of the amplification techniques discussed herein (e.g., including PCR amplification, in vitro transcription, etc.). As discussed, in some cases, the amplification may occur without substantial selective amplification of some oligonucleotide sequences over others.
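Under the idealized assumption that each PCR cycle multiplies every template by a constant factor of (1 + E), where E is the amplification efficiency (1.0 for perfect doubling), the number of cycles needed to reach a given number of copies per oligonucleotide can be estimated as below. Real reactions slow as they approach a plateau, so this hypothetical helper is only a rough guide.

```python
import math


def cycles_needed(fold: float, efficiency: float = 1.0) -> int:
    """Smallest cycle count n with (1 + efficiency) ** n >= fold (idealized PCR)."""
    if fold < 1:
        raise ValueError("fold must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Subtract a tiny tolerance so exact powers (e.g., 2**10) are not rounded up.
    return math.ceil(math.log(fold) / math.log1p(efficiency) - 1e-9)


for target in (1e2, 1e4, 1e6, 1e8):
    print(f"{target:.0e} copies: {cycles_needed(target)} cycles at 100% efficiency, "
          f"{cycles_needed(target, 0.9)} cycles at 90% efficiency")
```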

Any suitable technique may be used to generate the plurality of oligonucleotides. For example, the plurality of oligonucleotides may be synthetically produced, grown within a cell, grown on a substrate (e.g., in an array), or the like. Techniques for producing oligonucleotides are known to those of ordinary skill in the art. The plurality of oligonucleotides may also be computationally designed in some embodiments.

In one embodiment, the oligonucleotides may be amplified using PCR (polymerase chain reaction). In some cases, the oligonucleotides may be amplified while contained in a common liquid or solution. This is to be contrasted with certain PCR techniques, such as emulsion PCR or digital PCR, which require separation of the oligonucleotides, e.g., into separate compartments or droplets, prior to amplification so as to prevent relatively selective amplification of certain oligonucleotides from occurring. However, surprisingly, it has been found that such separation is not required, and other techniques (such as those described herein) may be used to prevent or reduce selective amplification while keeping the oligonucleotides together within a common solution.

As mentioned, in some cases, by using certain oligonucleotide structures and/or certain types of PCR techniques, selective amplification may be reduced or eliminated. For instance, in one set of embodiments, oligonucleotides are chosen to minimize competitive effects. For example, the oligonucleotides may have substantially the same lengths, and/or share identical or similar portions or regions.

For example, in one set of embodiments, the oligonucleotides may have a distribution of lengths such that no more than about 10%, no more than about 5%, no more than about 3%, or no more than about 1% of the oligonucleotides have a length that is less than about 80% or greater than about 120%, less than about 90% or greater than about 110%, or less than about 95% or greater than about 105% of the overall average length of the plurality of oligonucleotides.
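A length distribution can be checked against such a criterion with a few lines of Python. The sketch below computes the number-average length and the fraction of oligonucleotides whose length falls outside a chosen tolerance band around it; the function names and example lengths are hypothetical.

```python
from statistics import mean


def fraction_outside(lengths: list[int], tolerance: float = 0.20) -> float:
    """Fraction of oligonucleotides whose length lies outside
    (1 - tolerance) to (1 + tolerance) times the number-average length."""
    average = mean(lengths)  # number average (arithmetic mean)
    low, high = (1 - tolerance) * average, (1 + tolerance) * average
    outside = sum(1 for n in lengths if n < low or n > high)
    return outside / len(lengths)


def meets_length_spec(lengths: list[int], tolerance: float, max_fraction: float) -> bool:
    """True if at most `max_fraction` of the lengths fall outside the tolerance band."""
    return fraction_outside(lengths, tolerance) <= max_fraction


lengths = [100] * 95 + [60] * 5  # hypothetical: 95 oligos of 100 nt, 5 of 60 nt
print(mean(lengths), fraction_outside(lengths), meets_length_spec(lengths, 0.20, 0.10))
```

Here the 60-nt oligonucleotides fall below 80% of the 98-nt average, so 5% of the library lies outside the band, which satisfies a 10% limit but not a 1% limit.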

In another set of embodiments, the oligonucleotides may share one or more regions, such as index regions, that are identical or substantially similar. The oligonucleotides sharing index or other regions may have substantially the same lengths, as discussed above, or different lengths. In some embodiments, the oligonucleotides may comprise at least two index regions that are each identical or substantially similar, surrounding a variable region having different nucleotide sequences, and optionally, different lengths. For example, the oligonucleotides may include, in sequence, a first region that is identical or substantially similar to the other oligonucleotides, a second region that is not identical, and optionally, a third region that is identical or substantially similar to the other oligonucleotides. In some embodiments, competition among oligonucleotides may be controlled by using oligonucleotides selected to reduce amplification bias. For instance, in some cases, groups of oligonucleotides that have similar compositions may be amplified together.

In some cases, the index regions may have a length of greater than 5, 7, 10, 12, 14, 16, 18, or 20 nucleotides, and/or have a length of less than 30, 28, 25, 22, 20, 18, 16, 14, 12, or 10 nucleotides. For instance, the regions that are identical or substantially similar may have a length of between 18 and 22 nucleotides. The regions may be identical, or differ by no more than 1, 2, 3, 4, or 5 nucleotides (consecutively or non-consecutively) within the region.
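For index regions of equal length that are allowed to differ by a few nucleotides, one simple check is the number of mismatched positions (the Hamming distance). The sketch below assumes each oligonucleotide begins with one index region and ends with another, and counts substitutions only (insertions and deletions are not considered); the function names, sequences, and `max_mismatch` parameter are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def has_index_regions(oligo: str, index5: str, index3: str, max_mismatch: int = 2) -> bool:
    """True if both terminal regions match the reference index regions
    within `max_mismatch` substitutions each."""
    if len(oligo) < len(index5) + len(index3):
        return False
    head = oligo[: len(index5)]
    tail = oligo[len(oligo) - len(index3):]
    return hamming(head, index5) <= max_mismatch and hamming(tail, index3) <= max_mismatch


INDEX5 = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt index regions
INDEX3 = "TTGGCCAATTGGCCAATTGG"
oligo = "T" + INDEX5[1:] + "GATTACAGATTACA" + INDEX3  # one substitution in the 5' index
print(has_index_regions(oligo, INDEX5, INDEX3, max_mismatch=1))
```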

In certain embodiments, primer sequences may be added to facilitate the PCR reaction. For example, the primer sequence may include sequences substantially complementary to a region within the oligonucleotides, e.g., an index region. A variety of such sequences suitable for PCR or in vitro transcription may be readily obtained commercially.

In some embodiments, the primer sequence may also include other sequences, e.g., promoter sequences or other sequences that may be added to the oligonucleotide during PCR amplification, such as is shown in FIG. 1 with a T7 promoter. Accordingly, oligonucleotides comprising the original sequence and one or more promoter sequences may be produced in certain cases. Besides the T7 promoter, other suitable promoters that may be used include, but are not limited to, T3 promoters or SP6 promoters. Such promoters may be useful, for example, to facilitate transcription to produce RNA, as is discussed in more detail below. In addition, in some embodiments, more than one promoter may be added.

In one set of embodiments, more than one sequence containing a PCR primer may be used, e.g., to amplify different subsets of the plurality of oligonucleotides. If more than one primer-containing sequence is used, the PCR primers contained on each of them may be the same or different. Examples of suitable PCR primers include those described herein. Thus, for example, in one set of embodiments, the plurality of oligonucleotides may include different subpools having different index regions or other regions as discussed above, which may be selectively amplified through the use of an appropriate sequence including a PCR primer. Any suitable number of subpools may be created for a plurality of oligonucleotides. For example, at least 1, 2, 4, 10, 20, 96, 100, or 192 subpools of the plurality of oligonucleotides may be selectively amplified through the use of specific PCR primers.

Thus, in some cases, the oligonucleotides may be formed into “pools” or groups or sets of oligonucleotides within the plurality of oligonucleotides that share one or more common features, such as an index region or other identical sequence. For instance, the common feature in a group or set of oligonucleotides may be an identical sequence of at least 5, 7, 10, 12, 14, 16, 18, or 20 nucleotides, and/or less than 30, 28, 25, 22, 20, 18, 16, 14, 12, or 10 nucleotides. For example, the common region that is identical or substantially similar in a group of oligonucleotides may have a length of between 18 and 22 nucleotides. The common regions may be identical, or differ by no more than 1, 2, 3, 4, or 5 nucleotides (consecutively or non-consecutively) within the region. In some embodiments, each group or pool may contain two (or more) unique index regions that are not used in any other pool, e.g., to reduce contamination of the amplification products of one pool by off-target products amplified from another pool.
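The selection of a subpool with a specific primer pair, and the requirement that each pool's index regions not be reused by any other pool, can be sketched in silico as follows. The sketch assumes exact primer matching, a forward primer identical to the pool's 5′ index region, and a reverse primer equal to the reverse complement of its 3′ index region; the function names and the short 8-nt sequences are hypothetical placeholders (index regions described above may be longer, e.g., 18 to 22 nucleotides).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def select_subpool(library: list[str], forward_primer: str, reverse_primer: str) -> list[str]:
    """Oligos the primer pair could amplify, assuming exact matches: the 5' end
    equals the forward primer and the 3' end is complementary to the reverse primer."""
    three_prime_index = revcomp(reverse_primer)
    return [oligo for oligo in library
            if oligo.startswith(forward_primer) and oligo.endswith(three_prime_index)]


def indices_are_pool_specific(pools: dict[str, tuple[str, str]]) -> bool:
    """True if no index region is used more than once across all pools."""
    used = [index for pair in pools.values() for index in pair]
    return len(used) == len(set(used))


pools = {  # hypothetical (5' index, 3' index) pairs
    "pool_A": ("ACGGTCAT", "GTCCAGTA"),
    "pool_B": ("TGCAAGCT", "CCTAGGTA"),
}
library = [
    "ACGGTCAT" + "GATTACA" + "GTCCAGTA",
    "ACGGTCAT" + "CCGGAA" + "GTCCAGTA",
    "TGCAAGCT" + "TTAACC" + "CCTAGGTA",
]
five_prime, three_prime = pools["pool_A"]
subpool_a = select_subpool(library, five_prime, revcomp(three_prime))
print(len(subpool_a), indices_are_pool_specific(pools))
```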

In certain embodiments, PCR amplification may be monitored, and controlled to reduce or minimize selective amplification. For example, in one set of embodiments, real-time PCR techniques may be used. In some embodiments, the extent of the PCR reaction may be monitored or controlled, for example, by illuminating the solution containing the oligonucleotides with suitable light and determining the amount of fluorescence that is present to monitor the PCR reaction. Accordingly, for example, the reaction conditions may be controlled such that the oligonucleotides react in conditions that minimize the amount of selective amplification, for example, by providing an excess of nucleotides, ions (e.g., Mg²⁺), enzyme, etc. Once the oligonucleotide concentrations have reached the point where competitive effects may start to occur, the reaction may be stopped before significant selective amplification begins.
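One hypothetical stopping rule, sketched below, follows the background-subtracted fluorescence cycle by cycle and reports the last cycle before the per-cycle fold increase falls below a threshold (during an ideal exponential phase the signal doubles each cycle). The thresholds, the baseline estimate, the function names, and the simulated logistic trace are assumptions for illustration, not parameters taken from the embodiments above.

```python
def last_exponential_cycle(fluorescence, baseline_cycles=3, detect=0.05, min_fold=1.6):
    """Index of the last cycle before the background-subtracted signal stops
    growing by at least `min_fold` per cycle, or None if it never slows.

    Only cycles whose signal exceeds `detect` are evaluated, so that noise near
    the baseline is not mistaken for a slowdown.
    """
    baseline = sum(fluorescence[:baseline_cycles]) / baseline_cycles
    signal = [f - baseline for f in fluorescence]
    for cycle in range(1, len(signal)):
        previous = signal[cycle - 1]
        if previous > detect and signal[cycle] / previous < min_fold:
            return cycle - 1
    return None


def simulate_trace(cycles=35, start=1e-6, capacity=1.0, background=0.01):
    """Toy logistic amplification: roughly doubles per cycle until near `capacity`."""
    amount, trace = start, []
    for _ in range(cycles):
        trace.append(background + amount)
        amount += amount * (1 - amount / capacity)
    return trace


trace = simulate_trace()
stop = last_exponential_cycle(trace)
print(f"stop after cycle {stop}")
```

In practice the threshold would be tuned against the observed behavior of the particular library and reaction; stopping while the product is still well below its plateau leaves reagents in excess, consistent with the strategy described above.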

After amplification as discussed above, the amplified oligonucleotides may then be transcribed to produce RNA. Further amplification may also occur in this step. For instance, in some cases, each oligonucleotide can be used to produce, on the average, at least about 50, at least about 100, at least about 300, at least about 500, at least about 1,000, at least about 3,000, at least about 5,000, at least about 10,000, at least about 30,000, at least about 50,000, at least about 100,000, at least about 300,000, at least about 500,000, or at least about 1,000,000 transcribed RNA molecules. In some cases, the mass of RNA that is produced may be at least about 10, at least about 20, at least about 30, at least about 50, at least about 100, at least about 200, at least about 300, or at least about 500 times the mass of the oligonucleotides. Thus, for example, one microgram of oligonucleotides may be converted into at least 10 micrograms, at least 30 micrograms, or at least 100 micrograms of RNA.

In one set of embodiments, transcription may occur in vitro by exposing the amplified oligonucleotides to a suitable RNA polymerase. A variety of RNA polymerases are available commercially, including T7, T3, or SP6 RNA polymerases. Other non-limiting examples of RNA polymerases include RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, or RNA polymerase V. The RNA polymerase may arise from any suitable source, e.g., bacteria, viruses, or eukaryotes. In some embodiments, more than one RNA polymerase may be used. In addition, as previously discussed, in some embodiments, the amplified oligonucleotides may include promoter sequences, such as one or more of T7, T3, or SP6 promoter sequences, that can be used to facilitate the transcription process. Those of ordinary skill in the art will be aware of suitable conditions for causing transcription in vitro using RNA polymerases.

In some embodiments, the total amplification bias may be reduced by changing the relative amount of amplification produced by the PCR and the in vitro transcription. For example, the PCR can be used to produce smaller amounts of DNA than are typically produced in a PCR, to reduce the amplification bias of this process. However, this reduced yield can be compensated in some cases by increasing the duration of the in vitro transcription reaction.
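The rationale can be made concrete with a simple model in which two sequences differ slightly in per-cycle PCR efficiency, so that their ratio drifts geometrically with the number of cycles, while in vitro transcription is assumed (as a simplification) to produce the same number of transcripts from every template. Under that assumption, efficiencies of 1.00 and 0.95 skew the ratio about 1.88-fold after 25 cycles but only about 1.36-fold after 12 cycles, and the lost PCR yield can be recovered by asking more of the transcription step. The function names and the overall target are hypothetical.

```python
def pcr_ratio_skew(cycles: int, eff_a: float, eff_b: float) -> float:
    """Factor by which the A:B ratio changes after `cycles` of idealized PCR."""
    return ((1 + eff_a) / (1 + eff_b)) ** cycles


def ivt_fold_needed(total_fold: float, cycles: int, pcr_efficiency: float = 1.0) -> float:
    """Transcripts per PCR product needed so that PCR fold x IVT fold = total_fold."""
    return total_fold / (1 + pcr_efficiency) ** cycles


TARGET_FOLD = 1e9  # hypothetical overall amplification target
for n in (25, 12):
    print(f"{n} cycles: ratio skew {pcr_ratio_skew(n, 1.0, 0.95):.2f}, "
          f"IVT fold needed {ivt_fold_needed(TARGET_FOLD, n):.3g}")
```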

The RNA may, in turn, be reverse transcribed to produce DNA. In one set of embodiments, reverse transcription may occur by exposing the RNA to a suitable reverse transcriptase enzyme. In some cases, the reverse transcriptase may be a viral reverse transcriptase, e.g., M-MLV reverse transcriptase, AMV reverse transcriptase, or the like. A variety of reverse transcriptase enzymes are commercially available. Those of ordinary skill in the art will be aware of suitable conditions for causing reverse transcription to occur.

In certain embodiments, reverse transcription may be facilitated through the use of primer-containing sequences, e.g., sequences containing primers for reverse transcription. In some cases, the primer-containing sequences may contain other sequences or entities as well, although this is not necessarily a requirement. The primer-containing sequences may be added at any suitable point, e.g., just before starting the transcription reaction. Suitable primers for conducting reverse transcription may be obtained commercially.

In one set of embodiments, the primer-containing sequence may be incorporated into the DNA during production of the DNA by the reverse transcriptase. In some cases, the primer-containing sequence may contain other entities and/or sequences suitable for attaching other entities (e.g., on the 5′ or 3′ end, internally, etc.). For instance, the primer-containing sequence may contain a non-nucleic acid moiety, such as a digoxigenin moiety, a biotin moiety, etc., located on the 5′ end, the 3′ end, internally, or the like. In some cases, a signaling entity that can subsequently be detected or determined may be introduced into the DNA. For instance, the signaling entity may be fluorescent, or may be a specific nucleotide sequence that can be determined, e.g., enzymatically. Examples of signaling entities are discussed in more detail below.
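Because the reverse transcriptase extends the primer, the primer-containing sequence, and any 5′ modification it carries, ends up at the 5′ end of every cDNA molecule. The string-level sketch below assumes the primer is fully complementary to the 3′ end of the RNA; the function name and the `label` placeholder are hypothetical stand-ins for a real 5′ modification such as a dye or biotin.

```python
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str, primer: str, label: str = "5'-label-") -> str:
    """Return the cDNA (5'->3') made by extending `primer` along `rna` (5'->3').

    The primer must be the DNA reverse complement of the RNA's 3' end; the
    `label` string stands in for a hypothetical 5' modification on the primer.
    """
    rna = rna.upper()
    # The full-length cDNA is the DNA reverse complement of the RNA.
    cdna = "".join(COMPLEMENT[base] for base in reversed(rna))
    if not cdna.startswith(primer.upper()):
        raise ValueError("primer does not anneal to the 3' end of the RNA")
    # The primer forms the 5' end of the product, so its label does too.
    return label + cdna


print(reverse_transcribe("GGGAGAACGUCCA", "TGGACG"))
```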

In some cases, the RNA may be purified prior to reverse transcription. However, it should be noted that purification is not required, and in other embodiments, the RNA may be reverse transcribed to form DNA without any intermediate purification steps. If the RNA is purified, it may be purified using any suitable technique, e.g., by passing the RNA over a suitable column to remove oligonucleotides.

Optionally, in some embodiments, the RNA may be separated from the DNA, or the DNA may be purified in some fashion. For example, the RNA may be selectively degraded relative to the DNA. In one set of embodiments, the RNA may be degraded relative to the DNA by alkaline hydrolysis. For instance, the pH of the solution may be raised to at least about 8, at least about 9, at least about 10, etc. Any suitable base may be used to raise the pH. In some cases, after degradation of the RNA, the pH may also be lowered, e.g., to about 7, to about 7.4, to physiological conditions, or the like. In some cases, techniques such as enzymatic degradation can be used to selectively degrade RNA relative to DNA. The DNA may also be purified using techniques such as column purification, ethanol precipitation, and/or solid-phase reversible immobilization. In addition, in some cases, the DNA may be concentrated, e.g., through evaporation techniques.

In addition, techniques such as those described above may be scaled up or “numbered up” to produce larger quantities of material. For example, a process may be repeated using multi-well techniques or by simultaneously running multiple reactions in parallel, etc., to produce larger quantities or masses of oligonucleotides. As a non-limiting example, in one embodiment, processes such as those discussed herein may be performed using multiple wells of a microtiter plate (e.g., having 96, 384, or 1536 wells, etc.) to increase output.
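When reactions are numbered up across microtiter plates, each reaction has to be mapped to a plate and well. The sketch below assumes the standard layouts (96 = 8 × 12, 384 = 16 × 24, 1536 = 32 × 48), row-major filling, and rows beyond Z labeled AA, AB, and so on; the function names are hypothetical.

```python
LAYOUTS = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}  # wells -> (rows, columns)


def row_label(index: int) -> str:
    """Zero-based row index to label: 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 31 -> 'AF'."""
    label = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def well_names(plate_size: int) -> list[str]:
    """Well names in row-major order for a plate of the given size."""
    rows, cols = LAYOUTS[plate_size]
    return [f"{row_label(r)}{c + 1}" for r in range(rows) for c in range(cols)]


def assign_reactions(n_reactions: int, plate_size: int = 96) -> list[tuple[int, str]]:
    """Assign reactions to (plate number, well) pairs, filling plates in order."""
    wells = well_names(plate_size)
    return [(i // plate_size + 1, wells[i % plate_size]) for i in range(n_reactions)]


print(assign_reactions(100)[-1])  # (2, 'A4')
```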

The DNA may be used for a variety of purposes in different embodiments of the invention. For example, in certain embodiments, the DNA may be hybridized to nucleic acid species in liquid samples, e.g., extracted from a variety of biological sources, including humans. In some cases, the DNA may be used to physically separate one set of nucleic acids from another, or as primers for PCR or reverse transcription.

In addition, as previously discussed, signaling entities may be incorporated into the DNA in some embodiments. The signaling entities may be determined for a variety of purposes. For example, the DNA that is produced may be used as a biological probe, and the signaling entities may be determined in some fashion, e.g., quantitatively or qualitatively, to determine a characteristic or feature of the probe. Examples include, but are not limited to, the position, the activity, or the concentration of the probe.

In some cases, signaling entities within a sample may be determined, e.g., spatially, using a variety of techniques. In some embodiments, the signaling entities may be fluorescent, and techniques for determining fluorescence within a sample, such as fluorescence microscopy or confocal microscopy, may be used to spatially identify the positions of signaling entities within a cell. In some cases, the positions of the entities within the sample may be determined in two or even three dimensions.

In some embodiments, the spatial positions of the signaling entities may be determined at relatively high resolutions. For instance, the positions may be determined at spatial resolutions of better than about 100 micrometers, better than about 30 micrometers, better than about 10 micrometers, better than about 3 micrometers, better than about 1 micrometer, better than about 800 nm, better than about 600 nm, better than about 500 nm, better than about 400 nm, better than about 300 nm, better than about 200 nm, better than about 100 nm, better than about 90 nm, better than about 80 nm, better than about 70 nm, better than about 60 nm, better than about 50 nm, better than about 40 nm, better than about 30 nm, better than about 20 nm, or better than about 10 nm, etc.

There are a variety of techniques able to determine or image the spatial positions of entities optically, e.g., using fluorescence microscopy. In some cases, the spatial positions may be determined at super resolutions, or at resolutions better than the wavelength of light. Non-limiting examples include STORM (stochastic optical reconstruction microscopy), STED (stimulated emission depletion microscopy), NSOM (Near-field Scanning Optical Microscopy), 4Pi microscopy, SIM (Structured Illumination Microscopy), SMI (Spatially Modulated Illumination) microscopy, RESOLFT (Reversible Saturable Optically Linear Fluorescence Transition Microscopy), GSD (Ground State Depletion Microscopy), SSIM (Saturated Structured-Illumination Microscopy), SPDM (Spectral Precision Distance Microscopy), Photo-Activated Localization Microscopy (PALM), Fluorescence Photoactivation Localization Microscopy (FPALM), LIMON (3D Light Microscopical Nanosizing Microscopy), Super-resolution optical fluctuation imaging (SOFI), or the like. See, e.g., U.S. Pat. No. 7,838,302, issued Nov. 23, 2010, entitled “Sub-Diffraction Limit Image Resolution and Other Imaging Techniques,” by Zhuang, et al.; U.S. Pat. No. 8,564,792, issued Oct. 22, 2013, entitled “Sub-diffraction Limit Image Resolution in Three Dimensions,” by Zhuang, et al.; or Int. Pat. Apl. Pub. No. WO 2013/090360, published Jun. 20, 2013, entitled “High Resolution Dual-Objective Microscopy,” by Zhuang, et al., each incorporated herein by reference in their entireties.
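For scale, the conventional Abbe estimate of lateral resolution is d = λ / (2 NA), which for visible light and a high-NA objective is a few hundred nanometers. Single-molecule localization methods such as STORM and PALM instead fit the center of each molecule's image; in the simplest approximation (ignoring background and pixelation) the localization precision scales as s / √N, where s is the standard deviation of the point-spread function and N is the number of detected photons. The sketch below evaluates both; the wavelength, numerical aperture, and photon counts are illustrative assumptions.

```python
import math


def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe diffraction limit d = wavelength / (2 * NA), in nanometers."""
    return wavelength_nm / (2.0 * numerical_aperture)


def localization_precision_nm(psf_sd_nm: float, photons: int) -> float:
    """Simplified localization precision s / sqrt(N); background and pixel size ignored."""
    return psf_sd_nm / math.sqrt(photons)


print(f"{abbe_limit_nm(647, 1.4):.0f} nm")  # about 231 nm for red light and a 1.4-NA objective
print(f"{localization_precision_nm(100, 5000):.1f} nm")
```

The gap between the two numbers is why localization-based methods can report positions well below the diffraction limit, although real precision is worse once background is included.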

In addition, the signaling entity may be inactivated in some cases. For example, in some embodiments, a first secondary nucleic acid probe that contains a signaling entity and can recognize a first read sequence may be applied to a sample, and the first secondary nucleic acid probe can then be inactivated before a second secondary nucleic acid probe is applied to the sample. If multiple signaling entities are used, the same or different techniques may be used to inactivate them, and some or all of the multiple signaling entities may be inactivated, e.g., sequentially or simultaneously.

Inactivation may be caused by removal of the signaling entity (e.g., from the sample, or from the nucleic acid probe, etc.) and/or by chemically altering the signaling entity in some fashion (e.g., by photobleaching the signaling entity, or by bleaching or chemically altering its structure). For instance, in one set of embodiments, a fluorescent signaling entity may be inactivated by chemical or optical techniques such as oxidation, photobleaching, chemical bleaching, stringent washing, enzymatic digestion or reaction by exposure to an enzyme, dissociation of the signaling entity from other components (e.g., a probe), chemical reaction of the signaling entity (e.g., with a reactant able to alter its structure), or the like.
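The probe-then-inactivate cycle can be sketched as a loop in which only one secondary probe's signal is active during each imaging round. The class and function names below are hypothetical placeholders for wet-lab steps (hybridization, imaging, inactivation by bleaching or washing), not an instrument API described in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Toy sample tracking which secondary probes currently carry an active signal."""
    bound_read_sequences: set[str]
    active_signals: set[str] = field(default_factory=set)

    def hybridize(self, read_sequence: str) -> None:
        # A secondary probe only binds (and signals) if its read sequence is present.
        if read_sequence in self.bound_read_sequences:
            self.active_signals.add(read_sequence)

    def image(self) -> frozenset[str]:
        return frozenset(self.active_signals)

    def inactivate(self) -> None:
        # Stands in for photobleaching, chemical cleavage, stringent washing, etc.
        self.active_signals.clear()


def sequential_readout(sample: Sample, read_sequences: list[str]) -> list[frozenset[str]]:
    """Apply, image, and inactivate each secondary probe in turn."""
    images = []
    for read in read_sequences:
        sample.hybridize(read)
        images.append(sample.image())
        sample.inactivate()  # keeps this round's signal out of the next image
    return images


sample = Sample(bound_read_sequences={"R1", "R3"})
print(sequential_readout(sample, ["R1", "R2", "R3"]))
```

Without the inactivation step, signals from earlier rounds would accumulate and each image could no longer be attributed to a single read sequence.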

In some embodiments, various nucleic acid probes (including primary and/or secondary nucleic acid probes) may include one or more signaling entities. If more than one nucleic acid probe is used, the signaling entities may each be the same or different. In certain embodiments, a signaling entity is any entity able to emit light. For instance, in one embodiment, the signaling entity is fluorescent. In other embodiments, the signaling entity may be phosphorescent, radioactive, absorptive, etc. In some cases, the signaling entity is any entity that can be determined within a sample at relatively high resolutions, e.g., at resolutions better than the wavelength of visible light. The signaling entity may be, for example, a dye, a small molecule, a peptide or protein, or the like. The signaling entity may be a single molecule in some cases. If multiple secondary nucleic acid probes are used, the nucleic acid probes may comprise the same or different signaling entities.

Non-limiting examples of signaling entities include fluorescent entities (fluorophores) or phosphorescent entities, for example, cyanine dyes (e.g., Cy2, Cy3, Cy3B, Cy5, Cy5.5, Cy7, etc.), Alexa Fluor dyes, Atto dyes, photoswitchable dyes, photoactivatable dyes, fluorescent dyes, metal nanoparticles, semiconductor nanoparticles or “quantum dots,” fluorescent proteins such as GFP (Green Fluorescent Protein), or photoactivatable fluorescent proteins, such as PAGFP, PSCFP, PSCFP2, Dendra, Dendra2, EosFP, tdEos, mEos2, mEos3, PAmCherry, PAtagRFP, mMaple, mMaple2, and mMaple3. Other suitable signaling entities are known to those of ordinary skill in the art. See, e.g., U.S. Pat. No. 7,838,302 or U.S. Pat. Apl. Ser. No. 61/979,436, each incorporated herein by reference in its entirety.

As used herein, the term “light” generally refers to electromagnetic radiation having any suitable wavelength (or, equivalently, frequency). For instance, in some embodiments, the light may include wavelengths in the optical or visible range (for example, having a wavelength of between about 400 nm and about 700 nm, i.e., “visible light”), infrared wavelengths (for example, having a wavelength of between about 700 nm and about 300 micrometers), ultraviolet wavelengths (for example, having a wavelength of between about 10 nm and about 400 nm), or the like. In certain cases, as discussed in detail below, more than one entity may be used, i.e., entities that are chemically different or distinct, for example, structurally. However, in other cases, the entities may be chemically identical or at least substantially chemically identical.
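The wavelength ranges given above can be encoded directly. The sketch below classifies a vacuum wavelength using those approximate ranges and converts it to frequency via f = c / λ; how the shared boundaries (400 nm and 700 nm) are assigned is an assumption, since the text gives only approximate limits.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def classify_wavelength(wavelength_nm: float) -> str:
    """Classify light using the approximate ranges in the text (about 10 nm to 300 micrometers)."""
    if 400 <= wavelength_nm <= 700:
        return "visible"
    if 10 <= wavelength_nm < 400:
        return "ultraviolet"
    if 700 < wavelength_nm <= 300_000:
        return "infrared"
    return "outside the ranges discussed"


def frequency_hz(wavelength_nm: float) -> float:
    """Frequency equivalent of a vacuum wavelength."""
    return SPEED_OF_LIGHT_M_PER_S / (wavelength_nm * 1e-9)


for wl in (405, 532, 647, 350, 800):
    print(wl, classify_wavelength(wl), f"{frequency_hz(wl):.3e} Hz")
```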

In one set of embodiments, the signaling entity is “switchable,” i.e., the entity can be switched between two or more states, at least one of which emits light having a desired wavelength. In the other state(s), the entity may emit no light, or emit light at a different wavelength. For instance, an entity may be “activated” to a first state able to produce light having a desired wavelength, and “deactivated” to a second state not able to emit light of the same wavelength. An entity is “photoactivatable” if it can be activated by incident light of a suitable wavelength. As a non-limiting example, Cy5 can be switched between a fluorescent and a dark state in a controlled and reversible manner by light of different wavelengths: 633 nm (or 642 nm, 647 nm, or 656 nm) red light can switch or deactivate Cy5 to a stable dark state, while 405 nm violet light can switch or activate Cy5 back to the fluorescent state. In some cases, the entity can be reversibly switched between the two or more states, e.g., upon exposure to the proper stimuli. For example, a first stimulus (e.g., a first wavelength of light) may be used to activate the switchable entity, while a second stimulus (e.g., a second wavelength of light) may be used to deactivate the switchable entity, for instance, to a non-emitting state. Any suitable method may be used to activate the entity. For example, in one embodiment, incident light of a suitable wavelength may be used to activate the entity to emit light, i.e., the entity is “photoswitchable.” Thus, the photoswitchable entity can be switched between different light-emitting or non-emitting states by incident light, e.g., of different wavelengths. The light may be monochromatic (e.g., produced using a laser) or polychromatic. In another embodiment, the entity may be activated upon stimulation by an electric field and/or a magnetic field.
In other embodiments, the entity may be activated upon exposure to a suitable chemical environment, e.g., by adjusting the pH, or inducing a reversible chemical reaction involving the entity, etc. Similarly, any suitable method may be used to deactivate the entity, and the methods of activating and deactivating the entity need not be the same. For instance, the entity may be deactivated upon exposure to incident light of a suitable wavelength, or the entity may be deactivated by waiting a sufficient time.

Typically, a “switchable” entity can be identified by one of ordinary skill in the art by determining conditions under which an entity in a first state can emit light when exposed to an excitation wavelength, switching the entity from the first state to a second state, e.g., upon exposure to light of a switching wavelength, and then showing that the entity, while in the second state, can no longer emit light (or emits light at a much reduced intensity) when exposed to the excitation wavelength.
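The identification procedure above (excite, switch, excite again) can be sketched with a toy two-state model of a Cy5-like dye in which red light excites the dye and, with prolonged exposure, switches it to a dark state, while 405 nm light returns it to the fluorescent state. The class, its wavelength sets, and the split between brief excitation and prolonged illumination are simplifying assumptions modeled on the Cy5 example in the text, not measured behavior.

```python
from dataclasses import dataclass

RED_NM = {633, 642, 647, 656}  # red lines that excite the dye and can switch it off
ACTIVATING_NM = {405}          # light that returns the dye to the fluorescent state


@dataclass
class ToyPhotoswitch:
    fluorescent: bool = True

    def excite(self, wavelength_nm: int) -> bool:
        """Brief excitation: True if the dye emits at this excitation wavelength."""
        return self.fluorescent and wavelength_nm in RED_NM

    def illuminate(self, wavelength_nm: int) -> None:
        """Prolonged illumination that may switch the dye between states."""
        if wavelength_nm in RED_NM:
            self.fluorescent = False
        elif wavelength_nm in ACTIVATING_NM:
            self.fluorescent = True


def is_switchable(entity: ToyPhotoswitch, excitation_nm: int, switching_nm: int) -> bool:
    """Emits before switching, and no longer emits after switching."""
    emits_before = entity.excite(excitation_nm)
    entity.illuminate(switching_nm)
    emits_after = entity.excite(excitation_nm)
    return emits_before and not emits_after


dye = ToyPhotoswitch()
print(is_switchable(dye, excitation_nm=647, switching_nm=647))
dye.illuminate(405)  # reactivate
print(dye.excite(647))
```

Using the same red wavelength for excitation and switching mirrors the Cy5 example, where red light both reads out and deactivates the dye.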

In one set of embodiments, as discussed, a switchable entity may be switched upon exposure to light. In some cases, the light used to activate the switchable entity may come from an external source, e.g., a light source such as a laser, or from another light-emitting entity proximate to the switchable entity. This second, light-emitting entity may, in some cases, be a fluorescent entity, and in certain embodiments, it may itself also be a switchable entity.

In some embodiments, the switchable entity includes a first, light-emitting portion (e.g., a fluorophore) and a second portion that activates or “switches” the first portion. For example, upon exposure to light, the second portion of the switchable entity may activate the first portion, causing the first portion to emit light. Examples of activator portions include, but are not limited to, Alexa Fluor 405 (Invitrogen), Alexa Fluor 488 (Invitrogen), Cy2 (GE Healthcare), Cy3 (GE Healthcare), Cy3B (GE Healthcare), Cy3.5 (GE Healthcare), or other suitable dyes. Examples of light-emitting portions include, but are not limited to, Cy5, Cy5.5 (GE Healthcare), Cy7 (GE Healthcare), Alexa Fluor 647 (Invitrogen), Alexa Fluor 680 (Invitrogen), Alexa Fluor 700 (Invitrogen), Alexa Fluor 750 (Invitrogen), Alexa Fluor 790 (Invitrogen), DiD, DiR, YOYO-3 (Invitrogen), YO-PRO-3 (Invitrogen), TOTO-3 (Invitrogen), TO-PRO-3 (Invitrogen), or other suitable dyes. These may be linked together, e.g., covalently, either directly or through a linker, forming compounds such as, but not limited to, Cy5-Alexa Fluor 405, Cy5-Alexa Fluor 488, Cy5-Cy2, Cy5-Cy3, Cy5-Cy3.5, Cy5.5-Alexa Fluor 405, Cy5.5-Alexa Fluor 488, Cy5.5-Cy2, Cy5.5-Cy3, Cy5.5-Cy3.5, Cy7-Alexa Fluor 405, Cy7-Alexa Fluor 488, Cy7-Cy2, Cy7-Cy3, Cy7-Cy3.5, Alexa Fluor 647-Alexa Fluor 405, Alexa Fluor 647-Alexa Fluor 488, Alexa Fluor 647-Cy2, Alexa Fluor 647-Cy3, Alexa Fluor 647-Cy3.5, Alexa Fluor 750-Alexa Fluor 405, Alexa Fluor 750-Alexa Fluor 488, Alexa Fluor 750-Cy2, Alexa Fluor 750-Cy3, or Alexa Fluor 750-Cy3.5. Those of ordinary skill in the art will be aware of the structures of these and other compounds, many of which are available commercially. The portions may be linked via a covalent bond or by a linker, such as those described in detail below.
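The 25 example compounds listed above are every combination of five light-emitting portions (Cy5, Cy5.5, Cy7, Alexa Fluor 647, Alexa Fluor 750) with five activator portions (Alexa Fluor 405, Alexa Fluor 488, Cy2, Cy3, Cy3.5). The short sketch below regenerates that list in the same "reporter-activator" naming order; the function name is hypothetical.

```python
from itertools import product

REPORTERS = ["Cy5", "Cy5.5", "Cy7", "Alexa Fluor 647", "Alexa Fluor 750"]
ACTIVATORS = ["Alexa Fluor 405", "Alexa Fluor 488", "Cy2", "Cy3", "Cy3.5"]


def activator_reporter_pairs() -> list[str]:
    """All reporter-activator combinations, named 'reporter-activator'."""
    return [f"{reporter}-{activator}" for reporter, activator in product(REPORTERS, ACTIVATORS)]


pairs = activator_reporter_pairs()
print(len(pairs), pairs[:3])
```

Whether a given pair actually switches well remains an empirical question, settled by stimulating the pair and measuring the reporter's emission.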
Other light-emitting or activator portions may include portions having two quaternized nitrogen atoms joined by a polymethine chain, where each nitrogen is independently part of a heteroaromatic moiety, such as pyrrole, imidazole, thiazole, pyridine, quinoline, indole, benzothiazole, etc., or part of a nonaromatic amine. In some cases, there may be 5, 6, 7, 8, 9, or more carbon atoms between the two nitrogen atoms.

In certain cases, the light-emitting portion and the activator portion, when isolated from each other, may each be fluorophores, i.e., entities that can emit light of a certain emission wavelength when exposed to a stimulus, for example, an excitation wavelength. However, when a switchable entity is formed that comprises the first fluorophore and the second fluorophore, the first fluorophore forms a first, light-emitting portion, and the second fluorophore forms an activator portion that activates or “switches” the first portion in response to a stimulus. For example, the switchable entity may comprise a first fluorophore directly bonded to the second fluorophore, or the first and second fluorophores may be connected via a linker or a common entity. Whether a given pair of light-emitting portion and activator portion produces a suitable switchable entity can be tested by methods known to those of ordinary skill in the art. For example, light of various wavelengths can be used to stimulate the pair, and the light emitted from the light-emitting portion can be measured to determine whether the pair makes a suitable switch.

As a non-limiting example, Cy3 and Cy5 may be linked together to form such an entity. In this example, Cy3 is an activator portion that is able to activate Cy5, the light-emitting portion. Thus, light at or near the absorption maximum of the activator (second) portion of the entity (e.g., near 532 nm for Cy3) may cause that portion to activate the first, light-emitting portion, thereby causing the first portion to emit light (e.g., near 647 nm for Cy5). See, e.g., U.S. Pat. No. 7,838,302, incorporated herein by reference in its entirety. In some cases, the first, light-emitting portion can subsequently be deactivated by any suitable technique (e.g., by directing 647 nm red light to the Cy5 portion of the molecule).

Other non-limiting examples of potentially suitable activator portions include 1,5-IAEDANS, 1,8-ANS, 4-Methylumbelliferone, 5-carboxy-2,7-dichlorofluorescein, 5-Carboxyfluorescein (5-FAM), 5-Carboxynaphthofluorescein, 5-Carboxytetramethylrhodamine (5-TAMRA), 5-FAM (5-Carboxyfluorescein), 5-HAT (Hydroxy Tryptamine), 5-Hydroxy Tryptamine (HAT), 5-ROX (carboxy-X-rhodamine), 5-TAMRA (5-Carboxytetramethylrhodamine), 6-Carboxyrhodamine 6G, 6-CR 6G, 6-JOE, 7-Amino-4-methylcoumarin, 7-Aminoactinomycin D (7-AAD), 7-Hydroxy-4-methylcoumarin, 9-Amino-6-chloro-2-methoxyacridine, AB Q, Acid Fuchsin, ACMA (9-Amino-6-chloro-2-methoxyacridine), Acridine Orange, Acridine Red, Acridine Yellow, Acriflavin, Acriflavin Feulgen SITSA, Alexa Fluor 350, Alexa Fluor 405, Alexa Fluor 430, Alexa Fluor 488, Alexa Fluor 500, Alexa Fluor 514, Alexa Fluor 532, Alexa Fluor 546, Alexa Fluor 555, Alexa Fluor 568, Alexa Fluor 594, Alexa Fluor 610, Alexa Fluor 633, Alexa Fluor 635, Alizarin Complexon, Alizarin Red, AMC, AMCA-S, AMCA (Aminomethylcoumarin), AMCA-X, Aminoactinomycin D, Aminocoumarin, Aminomethylcoumarin (AMCA), Aniline Blue, Anthrocyl stearate, APTRA-BTC, APTS, Astrazon Brilliant Red 4G, Astrazon Orange R, Astrazon Red 6B, Astrazon Yellow 7 GLL, Atabrine, ATTO 390, ATTO 425, ATTO 465, ATTO 488, ATTO 495, ATTO 520, ATTO 532, ATTO 550, ATTO 565, ATTO 590, ATTO 594, ATTO 610, ATTO 611X, ATTO 620, ATTO 633, ATTO 635, ATTO 647, ATTO 647N, ATTO 655, ATTO 680, ATTO 700, ATTO 725, ATTO 740, ATTO-TAG CBQCA, ATTO-TAG FQ, Auramine, Aurophosphine G, Aurophosphine, BAO 9 (Bisaminophenyloxadiazole), BCECF (high pH), BCECF (low pH), Berberine Sulphate, Bimane, Bisbenzamide, Bisbenzimide (Hoechst), bis-BTC, Blancophor FFG, Blancophor SV, BOBO-1, BOBO-3, Bodipy 492/515, Bodipy 493/503, Bodipy 500/510, Bodipy 505/515, Bodipy 530/550, Bodipy 542/563, Bodipy 558/568, Bodipy 564/570, Bodipy 576/589, Bodipy 581/591, Bodipy 630/650-X, Bodipy 650/665-X, Bodipy 665/676, Bodipy Fl, Bodipy FL ATP, Bodipy Fl-Ceramide, Bodipy R6G, Bodipy TMR, Bodipy TMR-X conjugate, Bodipy TMR-X, SE, Bodipy TR, Bodipy TR ATP, Bodipy TR-X SE, BO-PRO-1, BO-PRO-3, Brilliant Sulphoflavin FF, BTC, BTC-5N, Calcein, Calcein Blue, Calcium Crimson, Calcium Green, Calcium Green-1 Ca²⁺ Dye, Calcium Green-2 Ca²⁺, Calcium Green-5N Ca²⁺, Calcium Green-C18 Ca²⁺, Calcium Orange, Calcofluor White, Carboxy-X-rhodamine (5-ROX), Cascade Blue, Cascade Yellow, Catecholamine, CCF2 (GeneBlazer), CFDA, Chromomycin A, CL-NERF, CMFDA, Coumarin Phalloidin, CPM Methylcoumarin, CTC, CTC Formazan, Cy2, Cy3.18, Cy3.5, Cy3, Cy5.18, cyclic AMP Fluorosensor (FiCRhR), Dabcyl, Dansyl, Dansyl Amine, Dansyl Cadaverine, Dansyl Chloride, Dansyl DHPE, Dansyl fluoride, DAPI, Dapoxyl, Dapoxyl 2, Dapoxyl 3, DCFDA, DCFH (Dichlorodihydrofluorescein Diacetate), DDAO, DHR (Dihydrorhodamine 123), Di-4-ANEPPS, Di-8-ANEPPS (non-ratio), DiA (4-Di-16-ASP), Dichlorodihydrofluorescein Diacetate (DCFH), DiD (Lipophilic Tracer), DiD (DiIC18(5)), DIDS, Dihydrorhodamine 123 (DHR), DiI (DiIC18(3)), Dinitrophenol, DiO (DiOC18(3)), DiR, DiR (DiIC18(7)), DM-NERF (high pH), DNP, Dopamine, DTAF, DY-630-NHS, DY-635-NHS, DyLight 405, DyLight 488, DyLight 549, DyLight 633, DyLight 649, DyLight 680, DyLight 800, ELF 97, Eosin, Erythrosin, Erythrosin ITC, Ethidium Bromide, Ethidium homodimer-1 (EthD-1), Euchrysin, EukoLight, Europium (III) chloride, Fast Blue, FDA, Feulgen (Pararosaniline), FIF (Formaldehyde-Induced Fluorescence), FITC, Flazo Orange, Fluo-3, Fluo-4, Fluorescein (FITC), Fluorescein Diacetate, Fluoro-Emerald, Fluoro-Gold (Hydroxystilbamidine), Fluor-Ruby, FluorX, FM 1-43, FM 4-46, Fura Red (high pH), Fura Red/Fluo-3, Fura-2, Fura-2/BCECF, Genacryl Brilliant Red B, Genacryl Brilliant Yellow 10GF, Genacryl Pink 3G, Genacryl Yellow SGF, GeneBlazer (CCF2), Glyoxylic Acid, Granular blue, Haematoporphyrin, Hoechst 33258, Hoechst 33342, Hoechst 34580, HPTS, Hydroxycoumarin, Hydroxystilbamidine (FluoroGold), Hydroxytryptamine, Indo-1 (high calcium), Indo-1 (low calcium), Indodicarbocyanine (DiD), Indotricarbocyanine (DiR), Intrawhite Cf, JC-1, JO-JO-1, JO-PRO-1, LaserPro, Laurodan, LDS 751 (DNA), LDS 751 (RNA), Leucophor PAF, Leucophor SF, Leucophor WS, Lissamine Rhodamine, Lissamine Rhodamine B, Calcein/Ethidium homodimer, LOLO-1, LO-PRO-1, Lucifer Yellow, Lyso Tracker Blue, Lyso Tracker Blue-White, Lyso Tracker Green, Lyso Tracker Red, Lyso Tracker Yellow, LysoSensor Blue, LysoSensor Green, LysoSensor Yellow/Blue, Mag Green, Magdala Red (Phloxin B), Mag-Fura Red, Mag-Fura-2, Mag-Fura-5, Mag-Indo-1, Magnesium Green, Magnesium Orange, Malachite Green, Marina Blue, Maxilon Brilliant Flavin 10 GFF, Maxilon Brilliant Flavin 8 GFF, Merocyanin, Methoxycoumarin, Mitotracker Green FM, Mitotracker Orange, Mitotracker Red, Mitramycin, Monobromobimane, Monobromobimane (mBBr-GSH), Monochlorobimane, MPS (Methyl Green Pyronine Stilbene), NBD, NBD Amine, Nile Red, Nitrobenzoxadidole, Noradrenaline, Nuclear Fast Red, Nuclear Yellow, Nylosan Brilliant Flavin EBG, Oregon Green, Oregon Green 488-X, Oregon Green 488, Oregon Green 500, Oregon Green 514, Pacific Blue, Pararosaniline (Feulgen), PBFI, Phloxin B (Magdala Red), Phorwite AR, Phorwite BKL, Phorwite Rev, Phorwite RPA, Phosphine 3R, PKH26 (Sigma), PKH67, PMIA, Pontochrome Blue Black, POPO-1, POPO-3, PO-PRO-1, PO-PRO-3, Primuline, Procion Yellow, Propidium Iodide (PI), PyMPO, Pyrene, Pyronine, Pyronine B, Pyrozal Brilliant Flavin 7GF, QSY 7, Quinacrine Mustard, Resorufin, RH 414, Rhod-2, Rhodamine, Rhodamine 110, Rhodamine 123, Rhodamine 5 GLD, Rhodamine 6G, Rhodamine B, Rhodamine B 200, Rhodamine B extra, Rhodamine BB, Rhodamine BG, Rhodamine Green, Rhodamine Phallicidine, Rhodamine Phalloidine, Rhodamine Red, Rhodamine WT, Rose Bengal, S65A, S65C, S65L, S65T, SBFI, Serotonin, Sevron Brilliant Red 2B, Sevron Brilliant Red 4G, Sevron Brilliant Red B, Sevron Orange, Sevron Yellow L, SITS, SITS (Primuline), SITS (Stilbene Isothiosulphonic Acid), SNAFL calcein, SNAFL-1, SNAFL-2, SNARF calcein, SNARF1, Sodium Green, SpectrumAqua, SpectrumGreen, SpectrumOrange, Spectrum Red, SPQ (6-methoxy-N-(3-sulfopropyl)quinolinium), Stilbene, Sulphorhodamine B can C, Sulphorhodamine Extra, SYTO 11, SYTO 12, SYTO 13, SYTO 14, SYTO 15, SYTO 16, SYTO 17, SYTO 18, SYTO 20, SYTO 21, SYTO 22, SYTO 23, SYTO 24, SYTO 25, SYTO 40, SYTO 41, SYTO 42, SYTO 43, SYTO 44, SYTO 45, SYTO 59, SYTO 60, SYTO 61, SYTO 62, SYTO 63, SYTO 64, SYTO 80, SYTO 81, SYTO 82, SYTO 83, SYTO 84, SYTO 85, SYTOX Blue, SYTOX Green, SYTOX Orange, Tetracycline, Tetramethylrhodamine (TAMRA), Texas Red, Texas Red-X conjugate, Thiadicarbocyanine (DiSC3), Thiazine Red R, Thiazole Orange, Thioflavin 5, Thioflavin S, Thioflavin TCN, Thiolyte, Thiozole Orange, Tinopal CBS (Calcofluor White), TMR, TO-PRO-1, TO-PRO-3, TO-PRO-5, TOTO-1, TOTO-3, TRITC (tetramethylrhodamine isothiocyanate), True Blue, TruRed, Ultralite, Uranine B, Uvitex SFC, WW 781, X-Rhodamine, XRITC, Xylene Orange, Y66F, Y66H, Y66W, YO-PRO-1, YO-PRO-3, YOYO-1, YOYO-3, SYBR Green, Thiazole orange (intercalating dyes), or combinations thereof.

In some aspects, the nucleotides can be used to study a sample, such as a biological sample. For instance, the nucleotides may be used to determine nucleic acids within a cell or other sample. The sample may include a cell culture, a suspension of cells, a biological tissue, a biopsy, an organism, or the like. The sample may also be cell-free but nevertheless contain nucleic acids. If the sample contains a cell, the cell may be a human cell, or any other suitable cell, e.g., a mammalian cell, a fish cell, an insect cell, a plant cell, or the like. More than one cell may be present in some cases.

The nucleic acids to be determined may be, for example, DNA, RNA, or other nucleic acids that are present within a cell (or other sample). The nucleic acids may be endogenous to the cell, or added to the cell. For instance, the nucleic acid may be viral, or artificially created. In some cases, the nucleic acid to be determined may be expressed by the cell. The nucleic acid is RNA in some embodiments. The RNA may be coding and/or non-coding RNA. Non-limiting examples of RNA that may be studied within the cell include mRNA, siRNA, rRNA, miRNA, tRNA, lncRNA, snoRNAs, snRNAs, exRNAs, piRNAs, or the like. In some embodiments, for example, at least some of the plurality of oligonucleotides are complementary to a portion of a specific chromosome sequence, e.g., of a human chromosome.

In some cases, a significant portion of the nucleic acid within the cell may be studied. For instance, in some cases, enough of the RNA present within a cell may be determined so as to produce a partial or complete transcriptome of the cell. In some cases, at least 3, at least 4, at least 7, at least 8, at least 12, at least 14, at least 15, at least 16, at least 22, at least 30, at least 31, at least 32, at least 50, at least 63, at least 64, at least 72, at least 75, at least 100, at least 127, at least 128, at least 140, at least 255, at least 256, at least 500, at least 1,000, at least 1,500, at least 2,000, at least 2,500, at least 3,000, at least 4,000, at least 5,000, at least 7,500, at least 10,000, at least 12,000, at least 15,000, at least 20,000, at least 25,000, at least 30,000, at least 40,000, at least 50,000, at least 75,000, or at least 100,000 types of mRNAs may be determined within a cell.

In some cases, the transcriptome of a cell may be determined. It should be understood that the transcriptome generally encompasses all RNA molecules produced within a cell, not just mRNA. Thus, for instance, the transcriptome may also include rRNA, tRNA, etc. In some embodiments, at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the transcriptome of a cell may be determined.

The determination of one or more nucleic acids within the cell or other sample may be qualitative and/or quantitative. In addition, the determination may also be spatial, e.g., the position of the nucleic acid within the cell or other sample may be determined in two or three dimensions. In some embodiments, the positions, number, and/or concentrations of nucleic acids within the cell (or other sample) may be determined.

One non-limiting example of such a system may be found in U.S. Provisional Patent Application Ser. No. 62/031,062, filed Jul. 30, 2014, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al., incorporated herein by reference in its entirety.

The following documents are each incorporated herein by reference in their entireties: U.S. Pat. No. 7,838,302, issued Nov. 23, 2010, entitled “Sub-Diffraction Limit Image Resolution and Other Imaging Techniques,” by Zhuang, et al.; U.S. Pat. No. 8,564,792, issued Oct. 22, 2013, entitled “Sub-diffraction Limit Image Resolution in Three Dimensions,” by Zhuang, et al.; and Int. Pat. Apl. Pub. No. WO 2013/090360, published Jun. 20, 2013, entitled “High Resolution Dual-Objective Microscopy,” by Zhuang, et al. In addition, incorporated herein by reference in their entireties are U.S. Provisional Patent Application Ser. No. 62/031,062, filed Jul. 30, 2014, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al.; U.S. Provisional Patent Application Ser. No. 62/050,636, filed Sep. 15, 2014, entitled “Probe Library Construction,” by Zhuang, et al.; U.S. Provisional Patent Application Ser. No. 62/142,653, filed Apr. 3, 2015, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al.; and a PCT application filed on even date herewith, entitled “Systems and Methods for Determining Nucleic Acids,” by Zhuang, et al.

The following examples are intended to illustrate certain embodiments of the present invention, but do not exemplify the full scope of the invention.

EXAMPLE 1

This example illustrates the high-throughput construction of DNA hybridization probes, according to certain embodiments of the invention.

Overview. This protocol uses complex libraries of oligonucleotides as templates for the enzymatic construction of large quantities of single-stranded DNA molecules that can be chemically labeled and that are designed to hybridize to specific sets of nucleic acid targets, which can vary significantly in complexity, i.e., in the number of unique target sequences. The protocol involves nine basic steps: 1) computational design and optimization of a set of oligonucleotide sequences that will serve as the hybridization regions on the sample of interest; 2) computational design and optimization of a large set of short oligonucleotide sequences to serve as highly specific PCR primers; 3) computational construction of the template molecules; 4) synthesis of the template molecules to create the template library; 5) selection of a subset of in vitro template molecules from the template library via PCR; 6) in-vitro-transcription-based amplification of the in vitro template molecules into RNA, which serves as the final template for the hybridization probes; 7) reverse transcription of the RNA back into DNA using chemically modified primers; 8) removal of the RNA via alkaline hydrolysis; and 9) purification of the chemically modified ssDNA molecules, called probes in this example. These steps are discussed in detail below.

Construction of Probes. Selection of Hybridization Regions. In this example, the software OligoArray 2.0 was used to design a large set of potential hybridization regions in the E. coli transcriptome. Briefly, this software selects hybridization regions whose length and melting temperature lie within user-specified ranges, and it also screens for potential off-target hybridization and secondary structure. Using all annotated, transcribed mRNAs in E. coli (K-12 MG1655; NC_000913.2), the software was run to generate 30-nt hybridization regions with a melting temperature of 80-85° C., a GC content of 50-60%, and a secondary-structure melting threshold and cross-hybridization melting temperature of 75° C.
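The length and GC-content criteria above can be illustrated with a short sliding-window filter. This is a minimal sketch, not OligoArray 2.0: it checks only window length and GC content, it does not reproduce the melting-temperature, secondary-structure, or cross-hybridization screens, and the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_regions(transcript: str, length: int = 30,
                      gc_min: float = 0.50, gc_max: float = 0.60):
    """Yield (start, window) for each window of `length` nt whose GC content
    lies in [gc_min, gc_max]. The Tm, secondary-structure and
    cross-hybridization filters used by OligoArray 2.0 are NOT applied here."""
    transcript = transcript.upper()
    for start in range(len(transcript) - length + 1):
        window = transcript[start:start + length]
        if gc_min <= gc_fraction(window) <= gc_max:
            yield start, window
```

Calling `list(candidate_regions(mrna_sequence))` on a hypothetical transcript string returns only the windows that pass the GC window; the survivors would still need the thermodynamic and specificity screens described above.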

Design of Hundreds of Orthogonal Primers. A given hybridization probe set may require far fewer unique sequences than are provided by the complex sets of oligonucleotides (the oligopool) typically generated by array-based synthesis. To exploit this disparity in complexity and significantly lower the cost per experiment, this example embeds a large number of unique template sets within a single oligopool. Briefly, each template molecule was flanked by a unique pair of PCR primers shared only by the template molecules for the probes within a given set. To facilitate the embedding of hundreds of unique probe template sets within a single oligopool, a protocol was created for constructing hundreds of orthogonal PCR primers.

Specifically, the protocol starts by truncating the members of an existing library of 240,000 semi-orthogonal 25-mer oligonucleotides to 20 nt and selecting oligos based on parameters optimal for PCR: a narrow range of predicted melting temperature (65-70° C.) and GC content (50-60%); the absence of contiguous runs of the same base longer than 4, i.e., AAAA; and the presence of a 3′ GC clamp, i.e., 2-3 G/C within the final 5 nt. BLAST was then used to compare these optimized primers against all annotated RNAs in the E. coli transcriptome, as well as against the T7 promoter (TAATACGACTCACTATAGGG) (SEQ ID NO. 1) and a common priming region (P9: CAGGCATCCGAGAGGTCTGG) (SEQ ID NO. 2). Potential primers with hits to the transcriptome of 12 nt or longer, or with a hit of any length within 3 nt of the primer's 3′ end, were removed. Finally, the remaining primers were screened for homology to each other, again using BLAST. Any primer with more than 11 nt of homology to another primer, or with any homology within 3 nt of the 3′ end of another primer, was removed. These cuts reduced the original 240,000 oligos to 198 highly optimized primers for the E. coli transcriptome. The final set of 198 primers is listed in FIGS. 3A-3D.
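The sequence-level primer cuts and the BLAST-based homology rule can be sketched as simple predicates. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the 65-70° C. melting-temperature window is not computed, BLAST hits are assumed to have already been parsed into (start, end) alignment coordinates on the primer (1-based, inclusive), and "within 3 nt of the 3′ end" is read as an alignment that overlaps the last 3 nt of the primer. The homopolymer cutoff is exposed as `max_run` because the example run AAAA could also be read as the excluded length, which would correspond to `max_run=3`.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (e.g. 5 for 'AAAAACG')."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq.upper()))


def has_gc_clamp(seq: str, window: int = 5, min_gc: int = 2, max_gc: int = 3) -> bool:
    """True if the last `window` nt at the 3' end contain min_gc..max_gc G/C."""
    tail = seq.upper()[-window:]
    return min_gc <= sum(base in "GC" for base in tail) <= max_gc


def passes_sequence_filters(primer: str, length: int = 20, max_run: int = 4) -> bool:
    """Length, GC (50-60%), homopolymer and GC-clamp cuts.
    The melting-temperature window is NOT computed in this sketch."""
    return (len(primer) == length
            and 0.50 <= gc_fraction(primer) <= 0.60
            and max_homopolymer(primer) <= max_run
            and has_gc_clamp(primer))


def reject_for_homology(hits, primer_len: int = 20,
                        max_hit_len: int = 11, end_window: int = 3) -> bool:
    """hits: iterable of (q_start, q_end) alignment coordinates on the primer,
    1-based and inclusive (assumed pre-parsed from BLAST output).
    Reject if any hit is longer than max_hit_len nt (i.e. 12 nt or more),
    or overlaps the last end_window nt at the primer's 3' end."""
    for q_start, q_end in hits:
        if q_end - q_start + 1 > max_hit_len:
            return True
        if q_end > primer_len - end_window:
            return True
    return False
```

Because the transcriptome screen (12 nt or longer) and the primer-primer screen (more than 11 nt) reduce to the same length threshold, a single `reject_for_homology` predicate can serve both passes in this sketch.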

If required, more primers could be generated using established techniques to create a larger set of initial oligonucleotides, or by relaxing the stringency of the cuts described above. Finally, by changing the transcriptome used to screen the primers, this approach can also be generalized to the generation of optimal index primers for any organism.

Template Construction. To design the template libraries used to create our probe sets, the set of RNA targets to be stained simultaneously was first selected. Individual template molecules were designed by concatenating the following sequences: i) the first of two unique primers for the appropriate mRNA group, ii) the common primer P9, iii) the site for the nicking enzyme Nb.BsmI, iv) the reverse complement of the hybridization region to the target, v) the reverse complement of the site for the nicking enzyme Nb.BsrDI, and vi) the reverse complement of the second unique primer for the given mRNA group. FIG. 2 illustrates this organization. Multiple probe template sets were combined into large oligopools, and these pools were synthesized by CustomArray.
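The template layout of elements i) through vi) can be expressed as a string concatenation. In this sketch, GAATGC (BsmI) and GCAATG (BsrDI) are assumed recognition sequences for the two nicking sites and should be checked against the enzyme supplier's documentation; `build_template` is a hypothetical helper, and the primer and hybridization sequences in the usage line are placeholders, not sequences from FIGS. 3 or 4.

```python
# Complement table for DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


P9 = "CAGGCATCCGAGAGGTCTGG"   # common priming region (SEQ ID NO. 2)
NB_BSMI_SITE = "GAATGC"       # assumption: BsmI recognition sequence
NB_BSRDI_SITE = "GCAATG"      # assumption: BsrDI recognition sequence


def build_template(primer_1: str, hybridization_region: str, primer_2: str) -> str:
    """Concatenate template elements i)-vi) in order."""
    return (
        primer_1                                    # i) first unique index primer
        + P9                                        # ii) common primer P9
        + NB_BSMI_SITE                              # iii) Nb.BsmI site
        + reverse_complement(hybridization_region)  # iv) RC of hybridization region
        + reverse_complement(NB_BSRDI_SITE)         # v) RC of Nb.BsrDI site
        + reverse_complement(primer_2)              # vi) RC of second unique primer
    )


template = build_template("A" * 20, "C" * 30, "G" * 20)  # placeholder inputs
```

With 20-nt primers and a 30-nt hybridization region, the placeholder call above yields a 102-nt template; all templates in one probe set would share the same `primer_1`/`primer_2` pair, which is what allows the set to be pulled out of the pooled library by PCR.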

FIG. 2 shows an example template sequence containing a probe for the mRNA acnB. Underlined at the beginning is the sequence of the first primer; not underlined is the common P9 priming site; next, underlined, is the Nb.BsmI site; not underlined is the reverse complement of the hybridization region for acnB; next, underlined, is the reverse complement of the Nb.BsrDI site; and the final, not underlined portion is the second unique primer. This template is one of 736 used to create probes that stain all mRNAs expressed at 1-10 copies per cell from the region of the E. coli genome spanning base pairs 1-100 kb. See FIGS. 4A-4BV for the sequences of the 736 probes.

Index PCR. Templates for specific probe sets were selected from the complex oligopool via limited-cycle PCR. 0.5 to 1 ng of the complex oligopool was combined with 0.5 micromolar of each primer. The forward primer matched the priming sequence for the desired subset, while the reverse primer consisted of the corresponding reverse priming sequence with the T7 promoter concatenated to its 5′ end. To avoid generating G-quadruplets, which can be difficult to synthesize, the terminal Gs required in the T7 promoter were supplied, where appropriate, by Gs located at the 5′ end of the priming region. All primers were synthesized by IDT. Each 50-microliter reaction was amplified using either the KAPA real-time library amplification kit (KAPA Biosystems; KK2701) or a homemade qPCR mix that included 0.8× EvaGreen (Biotium; 31000-T) and hot-start Phusion polymerase (New England Biolabs; M0535S). Amplification was monitored in real time using an Agilent Mx3005P or a Bio-Rad CFX Connect. Individual samples were removed before amplification reached its plateau, often at concentrations about 10-fold lower than the plateau concentration, to minimize distortion of template abundance due to over-amplification. Individual templates were purified on columns according to the manufacturer's instructions (Zymo DNA Clean and Concentrator; D4003) and eluted in RNase-free deionized water.
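The construction of the T7-tailed reverse primer, with the promoter's terminal Gs supplied by leading Gs of the priming sequence, can be sketched as follows. The overlap rule used here (reuse as many leading Gs as are available, up to the length of the promoter's terminal G run) is an assumption about how "where appropriate" was applied, and `t7_tailed_primer` is a hypothetical helper.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # SEQ ID NO. 1


def t7_tailed_primer(priming_seq: str, promoter: str = T7_PROMOTER) -> str:
    """Concatenate the T7 promoter to the 5' end of a priming sequence,
    letting leading Gs of the priming sequence stand in for the promoter's
    terminal Gs so that no extra G run is created at the junction."""
    priming_seq = priming_seq.upper()
    terminal_gs = len(promoter) - len(promoter.rstrip("G"))     # 3 for T7
    leading_gs = len(priming_seq) - len(priming_seq.lstrip("G"))
    overlap = min(terminal_gs, leading_gs)
    return promoter[:len(promoter) - overlap] + priming_seq
```

For example, `t7_tailed_primer("GGACGT")` returns `"TAATACGACTCACTATAGGGACGT"`, which still contains the full promoter but avoids the five consecutive Gs that simple concatenation would produce.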

Amplification via in vitro transcription. The template was then amplified via in vitro transcription. Briefly, 0.5 to 1 microgram of template DNA was amplified into 100 to 200 micrograms of RNA in a single 20-30 microliter reaction with a high-yield RNA polymerase (New England Biolabs; E2040S). Reactions were supplemented with 1× RNase inhibitor (Promega RNasin; N2611). Amplification was typically run for 2 to 4 hours at 37° C. to maximize yield. The RNA was not purified after the reaction and was either stored at −80° C. or immediately converted into DNA as described below.
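The stated input and output masses imply the following range of mass amplification when the extreme ends of the two ranges are paired. This is arithmetic on the numbers given above, not a measured result, and it ignores differences in length and strandedness between the DNA template and the RNA product; `mass_fold_range` is a hypothetical helper.

```python
def mass_fold_range(template_ug, rna_ug):
    """Lowest and highest RNA:DNA mass ratios implied by (min, max) ranges:
    the lowest pairs the smallest RNA yield with the largest template input,
    the highest pairs the largest RNA yield with the smallest input."""
    lowest = rna_ug[0] / template_ug[1]
    highest = rna_ug[1] / template_ug[0]
    return lowest, highest


fold = mass_fold_range((0.5, 1.0), (100.0, 200.0))  # (100.0, 400.0)
```

The result, roughly 100- to 400-fold by mass, shows that even the least favorable pairing of the stated figures gives at least a 100-fold mass gain in this step.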

Reverse Transcription. 1-2 nmol of fluorescently labeled ssDNA probe was created from the above in vitro transcription reactions using the reverse transcriptase Maxima H- (Thermo Scientific; EP0751). This enzyme was used because of its higher processivity and thermostability, which allowed large quantities of RNA to be converted into DNA in small volumes at temperatures that disfavor secondary structure formation. The unpurified RNA created above was supplemented with 1.6 mM of each dNTP, 1-2 nmol of fluorescently labeled P9 primer, 300 units of Maxima H-, 60 units of RNasin, and Maxima RT buffer to a final 1× concentration. The final 75-microliter reaction was incubated at 50° C. for 60 minutes.
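The reverse transcription recipe can be checked with unit arithmetic, using the identity 1 mM = 1 nmol per microliter. This sketch only converts the quantities stated above; the helper names are hypothetical.

```python
def nmol_from_mM(conc_mM: float, volume_uL: float) -> float:
    """Amount (nmol) in a volume; 1 mM equals 1 nmol per microliter."""
    return conc_mM * volume_uL


def uM_from_nmol(amount_nmol: float, volume_uL: float) -> float:
    """Concentration (micromolar) of an amount dissolved in a volume."""
    return amount_nmol / volume_uL * 1000.0


VOLUME_UL = 75.0                                   # final RT reaction volume
dntp_each_nmol = nmol_from_mM(1.6, VOLUME_UL)      # each dNTP at 1.6 mM
primer_uM = (uM_from_nmol(1.0, VOLUME_UL),         # 1 nmol labeled P9 primer
             uM_from_nmol(2.0, VOLUME_UL))         # 2 nmol labeled P9 primer
rt_units_per_uL = 300 / VOLUME_UL                  # Maxima H- units per microliter
```

The reaction therefore contains about 120 nmol of each dNTP, roughly 13-27 micromolar labeled primer, and 4 units of reverse transcriptase per microliter, so each dNTP is present in large molar excess over the primer.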

Strand Selection and Purification. The template RNA in the reaction above was then removed from the DNA via alkaline hydrolysis. 75 microliters of a solution of 0.25 M EDTA and 0.5 N NaOH was added to each reverse transcription reaction, and the sample was incubated at 95° C. for 10 minutes. The reaction was immediately neutralized by purifying the ssDNA probe with a modified version of the Zymo Oligo Clean and Concentrator protocol. Specifically, the 5-microgram capacity column was replaced with a 100-microgram capacity DNA column as appropriate, and the remainder of the protocol was run according to the manufacturer's instructions. Probe was eluted in 100 microliters of RNase-free deionized water and evaporated in a vacuum concentrator. The final pellet was resuspended in 10 microliters of RNase-free water and stored at −20° C. Denaturing polyacrylamide gel electrophoresis and absorption spectroscopy revealed that this protocol typically produces 90-100% incorporation of the fluorescent primer into full-length probe and 60-75% recovery of the total fluorescent probe. Thus, without exceeding a 150-microliter reaction volume, this protocol can be used to create ~2 nmol of fluorescent probe. The small reaction volumes are conducive to high-throughput fluid handling techniques and significantly lower the cost of this reaction compared with alternative approaches. Thus, 24-96 probes could be constructed in parallel, with minimal hands-on time, over two days, at a final cost of ~$14 per 2 nmol of each probe set.
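Adding an equal volume of the hydrolysis solution halves its concentrations in the combined sample. The dilution arithmetic (C1V1 = C2V2) is sketched below, assuming volumes are additive and that the reverse transcription reaction itself contributes no EDTA or NaOH; `diluted` is a hypothetical helper.

```python
def diluted(stock_conc: float, stock_uL: float, other_uL: float) -> float:
    """Final concentration after mixing stock_uL of a stock with other_uL of a
    solution lacking that component (C1*V1 = C2*V2, volumes assumed additive)."""
    return stock_conc * stock_uL / (stock_uL + other_uL)


RT_REACTION_UL = 75.0   # reverse transcription reaction volume
HYDROLYSIS_UL = 75.0    # added 0.25 M EDTA / 0.5 N NaOH solution

total_uL = RT_REACTION_UL + HYDROLYSIS_UL                 # combined volume
edta_M = diluted(0.25, HYDROLYSIS_UL, RT_REACTION_UL)     # final EDTA (M)
naoh_N = diluted(0.5, HYDROLYSIS_UL, RT_REACTION_UL)      # final NaOH (N)
```

Under these assumptions the hydrolysis step runs at about 0.125 M EDTA and 0.25 N NaOH in 150 microliters, which matches the 150-microliter maximum reaction volume cited above.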

While several embodiments of the present invention have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the present invention. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings of the present invention is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, the invention may be practiced otherwise than as specifically described and claimed. The present invention is directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present invention.

All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms.

The indefinite articles “a” and “an,” as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean “at least one.”

The phrase “and/or,” as used herein in the specification and in the claims, should be understood to mean “either or both” of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with “and/or” should be construed in the same fashion, i.e., “one or more” of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the “and/or” clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

As used herein in the specification and in the claims, “or” should be understood to have the same meaning as “and/or” as defined above. For example, when separating items in a list, “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used herein shall only be interpreted as indicating exclusive alternatives (i.e. “one or the other but not both”) when preceded by terms of exclusivity, such as “either,” “one of,” “only one of,” or “exactly one of.” “Consisting essentially of,” when used in the claims, shall have its ordinary meaning as used in the field of patent law.

As used herein in the specification and in the claims, the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase “at least one” refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, “at least one of A and B” (or, equivalently, “at least one of A or B,” or, equivalently “at least one of A and/or B”) can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

It should also be understood that, unless clearly indicated to the contrary, in any methods claimed herein that include more than one step or act, the order of the steps or acts of the method is not necessarily limited to the order in which the steps or acts of the method are recited.

In the claims, as well as in the specification above, all transitional phrases such as “comprising,” “including,” “carrying,” “having,” “containing,” “involving,” “holding,” “composed of,” and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of” shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03. 

1-72. (canceled)
 73. A method, comprising: simultaneously amplifying at least some of a plurality of oligonucleotides in a common solution using PCR to produce amplified oligonucleotides; transcribing in vitro at least some of the amplified oligonucleotides to produce RNA; reverse transcribing the RNA to produce transcribed DNA; and selectively degrading the RNA relative to the transcribed DNA.
 74. The method of claim 73, wherein the plurality of oligonucleotides have an average length of between 10 and 200 nucleotides.
 75. The method of claim 73, wherein the plurality of oligonucleotides includes at least 10 unique oligonucleotide sequences.
 76-79. (canceled)
 80. The method of claim 73, wherein amplifying at least some of the plurality of oligonucleotides comprises exposing at least some of the plurality of oligonucleotides to primer-containing sequences.
 81. (canceled)
 82. The method of claim 80, wherein at least some of the amplified oligonucleotides contain a promoter.
 83-85. (canceled)
 86. The method of claim 73, wherein at least some of the plurality of oligonucleotides comprise at least a first set of oligonucleotides having a first common index region, and a second set of oligonucleotides having a second common index region distinguishable from the first common index region.
 87. The method of claim 86, comprising amplifying the first set of oligonucleotides but not the second set of oligonucleotides.
 88. The method of claim 86, wherein the plurality of oligonucleotides comprises at least 2 sets of oligonucleotides having distinguishable common index regions.
 89-95. (canceled)
 96. The method of claim 73, wherein transcribing at least some of the amplified oligonucleotides comprises a mass of RNA that is at least 100-fold greater than the mass of amplified oligonucleotides.
 97. The method of claim 73, wherein transcribing at least some of the amplified oligonucleotides comprises exposing the amplified oligonucleotides to an RNA polymerase.
 98-102. (canceled)
 103. The method of claim 73, wherein transcribing at least some of the amplified oligonucleotides to produce RNA comprises producing, on average, at least 10 RNA copies of each of the amplified oligonucleotides.
 104-106. (canceled)
 107. The method of claim 73, wherein reverse transcribing the RNA comprises exposing the RNA to a reverse transcriptase.
 108. (canceled)
 109. The method of claim 73, wherein reverse transcribing the RNA to produce transcribed DNA occurs without first purifying the RNA from components used to produce the RNA.
 110. The method of claim 73, further comprising purifying the RNA from components used to produce the RNA prior to reverse transcribing the RNA to produce transcribed DNA.
 111. The method of claim 73, wherein reverse transcribing the RNA to produce transcribed DNA comprises reverse transcribing the RNA to produce transcribed DNA using a sequence containing a transcription primer.
 112. The method of claim 111, wherein the sequence containing a transcription primer is incorporated into the transcribed DNA.
 113-122. (canceled)
 123. The method of claim 73, wherein selectively degrading the RNA relative to the transcribed DNA comprises chemically reducing the RNA.
 124-126. (canceled)
 127. The method of claim 73, wherein the transcribed DNA is substantially single-stranded.
 128-140. (canceled)
 141. The method of claim 73, wherein each oligonucleotide of a subset of the oligonucleotides comprises an index portion that is identical.
 142. (canceled)
 143. The method of claim 73, wherein the plurality of oligonucleotides has a distribution of lengths such that no more than 10% of the oligonucleotides have a length that is less than 80% or greater than 120% of the overall average length of the plurality of oligonucleotides.